Large-scale Controlled Vocabulary Indexing for Named Entities
Abstract
A large-scale controlled vocabulary indexing system is described. The system currently covers almost 70,000 named entity topics and is applied to documents from thousands of news publications. Topic definitions are built through substantially automated knowledge engineering.
Related resources
Indexing and Comparison of Multi-Dimensional Entities in a Recommender System based on Ontological Approach
The paper describes an application of indexing—the technology currently widely used for processing and comparing textual information—to multi-dimensional entities of knowledge domains. We propose a model for building a frame-based ontology, which contains a domain conceptual model as well as a controlled vocabulary of “base terms” used for indexing. Further, the ontology constitutes the structu...
OOV Sensitive Named-Entity Recognition in Speech
Named Entity Recognition (NER), an information extraction task, is typically applied to spoken documents by cascading a large vocabulary continuous speech recognizer (LVCSR) and a named entity tagger. Recognizing named entities in automatically decoded speech is difficult since LVCSR errors can confuse the tagger. This is especially true of out-of-vocabulary (OOV) words, which are often named e...
Bilingual Indexing for Information Retrieval with AUTINDEX
AUTINDEX is a bilingual automatic indexing system for the two languages German and English. It is being developed within the EU-funded BINDEX project. The aim of the system is to automatically index large quantities of abstracts of scientific and technical papers from several areas of engineering. Automatic indexing takes place using a controlled vocabulary provided in monolingual and bilingual...
Automatic algorithm selection for MeSH Heading indexing based on meta-learning
We present a methodology that automatically selects indexing algorithms for each heading in MeSH, NLM’s vocabulary for indexing MEDLINE. While manually comparing indexing methods is manageable with a limited number of MeSH headings, a large number of them makes automation of this selection desirable. Results show that this process can be automated based on previously indexed MEDLINE records. We...
Bibliographic database access using free-text and controlled vocabulary: an evaluation
This paper evaluates and compares the retrieval effectiveness of various search models, based on either automatic text-word indexing or on manually assigned controlled descriptors. Retrieval is from a relatively large collection of bibliographic material written in French. Moreover, for this French collection we evaluate improvements that result from combining automatic and manual indexing. Fir...